find something interesting

Many of the Abila, Kronos-based employees of GAStech have company cars which are approved for both personal and business use. Those who do not have company cars have the ability to check out company trucks for business use, but these trucks cannot be used for personal business.
Employees with company cars are happy to have these vehicles, because the company cars are generally much higher quality than the cars they would be able to afford otherwise. However, GAStech does not trust their employees. Without the employees? knowledge, GAStech has installed geospatial tracking software in the company vehicles. The vehicles are tracked periodically as long as they are moving.
This vehicle tracking data has been made available to law enforcement to support their investigation. Unfortunately, data is not available for the day the GAStech employees went missing. Data is only available for the two weeks prior to the disappearance.
To promote local businesses, Kronos based companies provide a Kronos Kares benefit card to GASTech employees giving them discounts and rewards in exchange for collecting information about their credit card purchases and preferences as recorded on loyalty cards. This data has been made available to investigators in the hopes that it can help resolve the situation. However, Kronos Kares does not collect personal information beyond purchases.
As a visual analytics expert assisting law enforcement, your mission is to identify which GASTech employees made which purchases and identify suspicious patterns of behavior. You must cope with uncertainties that result from missing, conflicting, and imperfect data to make recommendations for further investigation.
Use visual analytics to analyze the available data and develop responses to the questions below. In addition, prepare a video that shows how you used visual analytics to solve this challenge. Submission instructions are available here. Entry forms are available for download below.
Note:
packages = c('igraph', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'clock', 'tidyverse','dplyr', 'tidyr','raster','sf','sp','tmap', 'gifski','mapview','writexl')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
credit_card <- read.csv("data/cc_data.csv")
glimpse(credit_card)
Rows: 1,490
Columns: 4
$ timestamp <chr> "1/6/2014 7:28", "1/6/2014 7:34", "1/6/2014 7:35"~
$ location <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <int> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
loyalty_card <- read.csv("data/loyalty_data.csv")
glimpse(loyalty_card)
Rows: 1,392
Columns: 4
$ timestamp <chr> "1/6/2014", "1/6/2014", "1/6/2014", "1/6/2014", "~
$ location <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~
Cause data type of credit card is character which is not correct, so we need to change to date time data type.
credit_card$timestamp <- date_time_parse(credit_card$timestamp,
zone = "",
format = "%m/%d/%Y %H:%M")
glimpse(credit_card)
Rows: 1,490
Columns: 4
$ timestamp <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <int> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
loyalty_card$timestamp <- date_time_parse(loyalty_card$timestamp,
zone = "",
format = "%m/%d/%Y")
glimpse(loyalty_card)
Rows: 1,392
Columns: 4
$ timestamp <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
$ location <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~
credit_card$Date <- format(credit_card$timestamp, format = "%Y-%m-%d")
credit_card$Date <- date_time_parse(credit_card$Date, zone = "", format = "%Y-%m-%d")
glimpse(credit_card)
Rows: 1,490
Columns: 5
$ timestamp <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <int> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
$ Date <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
card_joined <- credit_card %>%
full_join(loyalty_card, by = c("Date" = "timestamp", "location", "price"))
popular_credit_card <- credit_card %>%
group_by(location) %>%
summarise(count = n()) %>%
arrange(desc(count))
popular_loyalty_card <- loyalty_card %>%
group_by(location) %>%
summarise(count = n()) %>%
arrange(desc(count))
popular_top_credit <- popular_credit_card %>%
gather(location, count) %>%
arrange(desc(count)) %>%
top_n(6)
popular_top_credit
# A tibble: 6 x 2
location count
<chr> <int>
1 Katerina's Cafe 212
2 Hippokampos 171
3 Guy's Gyros 158
4 Brew've Been Served 156
5 Hallowed Grounds 92
6 Ouzeri Elian 87
top6_credit<-ggplot(data=popular_top_credit, aes(x=location, y=count)) +
geom_bar(stat="identity", fill="steelblue")+
theme_minimal()
top6_credit

popular_top_loyal <- popular_loyalty_card %>%
gather(location, count) %>%
arrange(desc(count)) %>%
top_n(6)
popular_top_loyal
# A tibble: 6 x 2
location count
<chr> <int>
1 Katerina's Cafe 195
2 Hippokampos 155
3 Guy's Gyros 146
4 Brew've Been Served 140
5 Ouzeri Elian 84
6 Hallowed Grounds 80
top6_loyal<-ggplot(data=popular_top_loyal, aes(x=location, y=count)) +
geom_bar(stat="identity", fill="pink")+
theme_minimal()
top6_loyal

abnormal_credit_card <- popular_locations %>%
drop_na(loyaltynum) %>%
group_by(last4ccnum) %>%
summarize(loy_n = n_distinct(loyaltynum)) %>%
filter(loy_n > 1)
abnormal_credit_card
# A tibble: 7 x 2
last4ccnum loy_n
<int> <int>
1 1286 2
2 4795 2
3 4948 2
4 5368 2
5 5921 2
6 7889 2
7 8332 2
The most popular locations are as below:
What’s more i found something abnormal. Reasonably one credit card transaction should relate to only one loyalty card record for the same amount value, which means who consumed the money and would get the same value of point collection for loyalty card. We can filter out those credit card which have more than one loyalty card record. We can see that those last4ccnum has anomalies being observed, which are 1286, 4795, 4948, 5368, 5921, 7889, and 8332. I will pay more attention to these card when mapping later.
packages = c('raster', 'sf',
'tmap', 'clock',
'tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
bgmap <- raster("data/MC2-tourist.tif")
bgmap
class : RasterLayer
band : 1 (of 3 bands)
dimensions : 1595, 2706, 4316070 (nrow, ncol, ncell)
resolution : 3.16216e-05, 3.16216e-05 (x, y)
extent : 24.82419, 24.90976, 36.04499, 36.09543 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +datum=WGS84 +no_defs
source : MC2-tourist.tif
names : MC2.tourist
values : 0, 255 (min, max)
tmap_mode("plot")
tm_shape(bgmap) +
tm_raster(bgmap,
legend.show = FALSE)

tm_shape(bgmap) +
tm_rgb(bgmap,r=1,g=2,b=3,
alpha = NA,
saturation = 1,
interpolate = TRUE,
max.value = 255)

Abila_st <- st_read(dsn = "data/Geospatial",
layer = "Abila")
Reading layer `Abila' from data source
`D:\LLLEMON21\DataViz_blog\_posts\Individual Assignment\data\Geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS: WGS 84
gps2 <- read_csv("data/gps2.csv")
glimpse(gps2)
Rows: 685,169
Columns: 6
$ Timestamp <chr> "1/6/2014 7:20", "1/6/2014 7:20", "1/6/201~
$ id <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
$ lat <dbl> 36.06646, 36.06634, 36.06615, 36.06613, 36~
$ long <dbl> 24.88258, 24.88259, 24.88258, 24.88258, 24~
$ `Time Difference` <time> NA, 00:02:00, 00:03:00, 00:01:00, 0~
$ Seconds <dbl> 0, 2, 3, 1, 3, 1, 1, 1, 4, 1, 1, 2, 3, 1, ~
gps2$Timestamp <- date_time_parse(gps2$Timestamp,
zone = "",
format = "%m/%d/%Y %H:%M")
change data type of “id” to the “id” form to correctly show
gps2$id <- as_factor(gps2$id)
Converting Aspatial Data into a Simple Feature Data Frame
gps_sf <- st_as_sf(gps2,
coords = c("long", "lat"),
crs= 4326)
Add more columns to get features that help to analyze
Manually filter those gps record which had more than 5 minutes time interval
more_than_5mins <- gps_sf %>%
filter(Seconds > 300)
Plotting the gps Paths
gps_path <- gps_sf %>%
group_by(id, hour, day, minute) %>%
summarize(m = mean(Timestamp),
do_union=FALSE) %>%
st_cast("LINESTRING")
Figure out gps path which only got one record and only select gps records which are more than 1, cause we should get path which having one more gps record.
p = npts(gps_path, by_feature = TRUE)
gps_path2 <- cbind(gps_path, p)
gps_path2 <- gps_path2 %>%
filter(p>1)
gps_dot <- more_than_5mins %>%
group_by(id, hour, day, minute) %>%
summarize(geo_n = n_distinct(geometry)) %>%
st_cast("POINT")
card_selected <- card_joined %>%
filter(last4ccnum == 9735)
Take the graph below as an example, after getting the gps dot plot then I get the gps record on the graph. then check with credit card ID purchasing record.The purchasing record a bit earlier than the gps record which can guess this person drive away after paying by credit card.
car <- read_csv("data/car-assignments.csv")
glimpse(car)
Rows: 44
Columns: 5
$ LastName <chr> "Calixto", "Azada", "Balas", "Barranc~
$ FirstName <chr> "Nils", "Lars", "Felix", "Ingrid", "I~
$ CarID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12~
$ CurrentEmploymentType <chr> "Information Technology", "Engineerin~
$ CurrentEmploymentTitle <chr> "IT Helpdesk", "Engineer", "Engineer"~
car <- car %>%
drop_na(CarID)
car$CarID <- as_factor(car$CarID)
glimpse(car)
Rows: 35
Columns: 5
$ LastName <chr> "Calixto", "Azada", "Balas", "Barranc~
$ FirstName <chr> "Nils", "Lars", "Felix", "Ingrid", "I~
$ CarID <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12~
$ CurrentEmploymentType <chr> "Information Technology", "Engineerin~
$ CurrentEmploymentTitle <chr> "IT Helpdesk", "Engineer", "Engineer"~
glimpse(gps2)
Rows: 685,169
Columns: 6
$ Timestamp <dttm> 2014-01-06 07:20:00, 2014-01-06 07:20:00,~
$ id <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
$ lat <dbl> 36.06646, 36.06634, 36.06615, 36.06613, 36~
$ long <dbl> 24.88258, 24.88259, 24.88258, 24.88258, 24~
$ `Time Difference` <time> NA, 00:02:00, 00:03:00, 00:01:00, 0~
$ Seconds <dbl> 0, 2, 3, 1, 3, 1, 1, 1, 4, 1, 1, 2, 3, 1, ~
car_gps <- car %>%
full_join(gps2, by = c("CarID" = "id"))
car_gps <- st_as_sf(car_gps,
coords = c("long", "lat"),
crs= 4326)
car_gps <- car_gps %>%
unite("Name", FirstName, LastName, sep = " ")
abnormal_cc_match <- read.csv("data/abnormal_cc.csv")
abnormal_cc_match
ï..CC_number Loyalty_number Car_ID Name
1 1286 L3572 22 Adra Nubarron
2 1286 L3288 22 Adra Nubarron
3 4795 L8566 34 Edvard Vann
4 4948 L3295 18 Birgitta Frente
5 5921 L9406 29 Bertrand Ovan
6 5921 L3295 29 Bertrand Ovan
7 7889 L6119 8 Lucas Alcazar
8 7889 L2247 8/22/6 -
9 8332 L2070 10 Ada Campo-Corrente
CurrentEmploymentType CurrentEmploymentTitle
1 Security Badging Office
2 Security Badging Office
3 Security Perimeter Control
4 Engineering Geologist
5 Facilities Facilities Group Manager
6 Facilities Facilities Group Manager
7 Information Technology IT Technician
8 - -
9 Executive SVP/CIO
total_match <- read.csv("data/total_match.csv")
total_match
ï..CC_number Loyalty_number Car_ID Name
1 9551 L5777 1 Nils Calixto
2 1415 L7783 2 Lars Azada
3 9635 L3191 3 Felix Balas
4 7688 L4164 4 Ingrid Barranco
5 6899 L6267 5 Isak Baza
6 7253 L1682 6 Linnea Bergen
7 2540 L5947 7 Elsa Orilla
8 1877 L3014 9 Gustav Cazar
9 1311 L4149 11 Axel Calzas
10 7108 L6544 12 Hideki Cocinaro
11 5407 L4034 13 Inga Ferro
12 7819 L5259 13 Inga Ferro
13 1874 L4424 14 Lidelse Dedos
14 3853 L1485 15 Loreto Bodrogi
15 7354 L9254 16 Isia Vann
16 7384 L3800 17 Sven Flecha
17 9617 L5553 18 Birgitta Frente
18 2418 L9018 19 Vira Frente
19 6895 L3366 19 Vira Frente
20 6816 L8148 20 Stenig Fusil
21 9405 L3259 21 Hennie Osvaldo
22 3484 L2490 23 Varja Lagos
23 4434 L2169 24 Minke Mies
24 8202 L2343 24 Minke Mies
25 2142 L9637 25 Kanon Herrero
26 1310 L8012 26 Marin Onda
27 2681 L1107 27 Kare Orilla
28 3492 L7814 27 Kare Orilla
29 9241 L3288 28 Isande Borrasca
30 3547 L9362 29 Bertrand Ovan
31 6691 L6267 29 Bertrand Ovan
32 6901 L9363 30 Felix Resumir
33 5010 L2459 31 Sten Sanjorge Jr.
34 8156 L5224 32 Orhan Strum
35 9683 L7291 33 Brand Tempestad
36 2463 L6886 35 Willem Vasco-Pais
37 3506 L7761 101 #N/A
38 9220 L4063 101 #N/A
39 9614 L5924 101 #N/A
40 8642 L2769 104 #N/A
41 7792 L5756 105 #N/A
42 2276 L3317 106 #N/A
43 4530 L8477 107 #N/A
44 9735 L9633 107 #N/A
CurrentEmploymentType CurrentEmploymentTitle
1 Information Technology IT Helpdesk
2 Engineering Engineer
3 Engineering Engineer
4 Executive SVP/CFO
5 Information Technology IT Technician
6 Information Technology IT Group Manager
7 Engineering Drill Technician
8 Engineering Drill Technician
9 Engineering Hydraulic Technician
10 Security Site Control
11 Security Site Control
12 Security Site Control
13 Engineering Engineering Group Manager
14 Security Site Control
15 Security Perimeter Control
16 Information Technology IT Technician
17 Engineering Geologist
18 Engineering Hydraulic Technician
19 Engineering Hydraulic Technician
20 Security Building Control
21 Security Perimeter Control
22 Security Badging Office
23 Security Perimeter Control
24 Security Perimeter Control
25 Engineering Geologist
26 Engineering Drill Site Manager
27 Engineering Drill Technician
28 Engineering Drill Technician
29 Engineering Drill Technician
30 Facilities Facilities Group Manager
31 Facilities Facilities Group Manager
32 Security Security Group Manager
33 Executive President/CEO
34 Executive SVP/COO
35 Engineering Drill Technician
36 Executive Environmental Safety Advisor
37 #N/A #N/A
38 #N/A #N/A
39 #N/A #N/A
40 #N/A #N/A
41 #N/A #N/A
42 #N/A #N/A
43 #N/A #N/A
44 #N/A #N/A
I combined with car assignment file and inferred the owners of each credit card and loyalty card, as shown above.
i can see from the path above, these three people have the common path very often so gathering a lot, what’s more they often get coffee time in the morning and they are in the same security department, so i guess they have relative close relationship and get along quite well.
these two people are really skeptical cause they went to hotel many times during working time.
i filter out the gps path which got record at 2am, 3am, and 4am. then i found these car ID, 21, 24, 15 got record in this time period which is very skeptical, and the abnormal place are Abila Scrapyard and GAS Tech, i am not sure if they need work that late in the company and cannot figure out the reason why they still outside in the midnight.